Speech analyzer-synthesizer system employing improved formant extractor

ABSTRACT

An important step in speech signal analysis is the identification of formant frequencies of voiced speech. Formant data is necessary in the synthesizer used, for example, in a resonance vocoder. To derive these data, i.e., to obtain an estimate of the pitch period of the signal and its spectral envelope, a cepstrum of a speech signal is used. The lowest three formants of a voiced speech signal are then estimated from a smoothed spectral envelope using constraints on formant frequency ranges and relative levels of spectral peaks at the formant frequencies. These constraints allow detection in cases where formants are too close together to be resolved from the initial spectral envelope.

United States Patent: Rabiner et al.

[45] Mar. 14, 1972

[54] SPEECH ANALYZER-SYNTHESIZER SYSTEM EMPLOYING IMPROVED FORMANT EXTRACTOR

[72] Inventors: Lawrence R. Rabiner, Chatham; Ronald W. Schafer, New Providence, both of N.J.

[21] Appl. No.: 872,050

[56] References Cited

UNITED STATES PATENTS
2,938,079 5/1960 Flanagan 179/15.55
3,190,963 6/1965 David 179/15 A
3,268,660 8/1966 Flanagan 179/15 A
3,328,525 6/1967 Kelly 179/15.55
3,448,216 6/1969 Kelly 179/15.55
3,493,684 2/1970 Kelly 179/15 A

Primary Examiner: Kathleen H. Claffy
Assistant Examiner: Jon Bradford Leaheey
Attorney: R. J. Guenther and William L. Keefauver

9 Claims, 11 Drawing Figures

[Drawing sheets 1-2 of 5: FIG. 1, block diagram of the analyzer-synthesizer; FIG. 2, spectral envelope estimator 15; FIG. 3, pitch detector 14]

[Drawing sheets 3-4 of 5: FIG. 4, flow chart of unvoiced spectrum coder 18; FIG. 5, arrangement of FIGS. 6 and 7; FIG. 6, flow chart of the F1 and F2 search in voiced spectrum coder 19]

Patented Mar. 14, 1972    3,649,765

[FIG. 7: flow chart continuing the search for F3 in voiced spectrum coder 19]

SPEECH ANALYZER-SYNTHESIZER SYSTEM EMPLOYING IMPROVED FORMANT EXTRACTOR

This invention relates to the analysis and synthesis of speech in bandwidth compression systems. More specifically, it relates to the identification and extraction of formants from continuous human speech.

BACKGROUND OF THE INVENTION

In order to make more economical use of the frequency bandwidth of speech transmission channels, a number of bandwidth compression arrangements have been devised for transmitting the information content of a speech wave over a channel whose bandwidth is substantially narrower than that required for analog transmission of the speech wave itself. Bandwidth compression systems typically include, at a transmitter terminal, an analyzer for deriving from an incoming speech wave a group of narrow bandwidth control signals representative of selected information-bearing characteristics of the speech wave and, at a receiver terminal, a synthesizer for reconstructing from the control signals a replica of the original speech wave.

1. Field of the Invention

It has been demonstrated that a speech waveform can be constructed by means of an arrangement that corresponds generally to the structure of the human vocal tract. Speech is produced in such an arrangement by exciting a series or parallel connection of resonators by random noise, to produce unvoiced sounds; by a quasi-periodic pulse train, to produce voiced sounds; or, in some cases, by a mixture of these sources, to produce voiced fricatives. To produce natural sounding speech, the mode of operation of the human vocal tract is simulated by continuously tuning the natural frequencies of the resonators. So tuned, the resonators establish resonances at selected frequencies, producing peaks, or maxima, in the amplitude spectrum of the reconstructed signal that correspond to the principal resonances, or formants, of the human vocal tract. Since the first three formants, in order of frequency, contribute most to the intelligibility of speech, it is common practice to transmit at least three formant control signals to shape an artificial spectrum at the synthesizer.

2. Discussion of the Prior Art

Since formants are effective parameters for the production of artificial human speech, they are used as control signals, for example, in such devices as the well-known resonance vocoder. A typical resonance vocoder is described in J. C. Steinberg, U.S. Pat. No. 2,635,146, issued Apr. 14, 1953. Further, since the quality of speech reconstructed by a resonance vocoder or the like is largely dependent on the proper identification of formant frequencies and locations, a number of techniques have been proposed for extracting formant information from a speech wave. One such proposal is described in J. L. Flanagan, U.S. Pat. No. 2,938,079, issued May 24, 1960. Electrical methods for speech synthesis using formant data are also discussed in detail in Speech Analysis, Synthesis and Perception by J. L. Flanagan, Academic Press, Inc., 1965.

SUMMARY OF THE INVENTION

It is an object of this invention to improve the accuracy and efficiency with which formants are derived from a speech signal. It is another object to use these formants and other selected parameters to transmit, over a narrow band communication circuit, sufficient information with which to produce an accurate replica of an input speech signal.

These and other objects are achieved, in accordance with this invention, by determining, at a transmitter station, as a function of time, the pitch period, the amplitude of voiced and unvoiced excitation, the location of the lowest three formants for voiced speech, and the locations of a single pole and zero necessary for the synthesis of unvoiced speech. These data are suitable for transmission to a receiver station for use in the synthesis of speech. Since the system is not pitch-synchronous, an exact determination of pitch period is not required. Instead, several periods of speech may be examined at a time. Averaging of this sort has the advantage of eliminating the difficult problem of accurately determining pitch periods in the acoustic waveform.

The analysis of applied voiced speech thus involves two basic parts: first, an estimation of pitch period and a computation of the spectral envelope of the applied signal; and, second, an estimation of formants from the spectral envelope. Estimation of the pitch period and the spectral envelope is accomplished through a computation of the cepstrum of a segment of the applied speech waveform. The cepstrum of a segment of sampled speech is defined as the inverse transform of the logarithm of the Fourier transform of that segment. Cepstral techniques for pitch period estimation have been described in "Cepstrum Pitch Determination" by A. M. Noll, Journal of the Acoustical Society of America, February 1967, at page 293. Previous investigations have shown that it is reasonable to assume that the logarithm of the Fourier transform (actually the logarithm of the z-transform in the case of sampled data) of a segment of voiced speech consists of a slowly varying component attributable to the convolution of the glottal pulse with the vocal tract impulse response, plus a rapidly varying periodic component due to the repetitive nature of the acoustic waveform. These two additive components can be separated by linear filtering of the logarithm of the transform. The assumption that the log magnitude is composed of two separate components is supported by investigation of models of the production of speech waveforms.
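As a minimal sketch of the definition just given, assuming NumPy and a segment that has already been windowed, the real cepstrum can be computed as the inverse DFT of the log magnitude of the segment's DFT. The function name `real_cepstrum` and the small floor on the magnitude are illustrative choices, not part of the patent.

```python
import numpy as np


def real_cepstrum(segment, n_fft=None):
    """Return the real cepstrum c(nT) of a (windowed) speech segment.

    Computes the inverse DFT of log|X(k)|, where X is the DFT of the segment.
    """
    x = np.asarray(segment, dtype=float)
    n_fft = n_fft or len(x)
    spectrum = np.fft.fft(x, n_fft)
    # A small floor keeps the logarithm finite at exact spectral zeros.
    log_mag = np.log(np.maximum(np.abs(spectrum), 1e-12))
    # log|X| is real and even, so its inverse DFT is real up to rounding error.
    return np.real(np.fft.ifft(log_mag))
```

For a two-pulse test signal x(n) = δ(n) + 0.5δ(n − 50), the cepstrum has a value of 0.25 at n = 50, the spacing of the pulses, which is the kind of peak the pitch search looks for.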

Accordingly, the pitch period is determined by searching the cepstrum for a strong peak in a region encompassing the minimum expected pitch period. The spectral envelope is obtained by low pass filtering of the log magnitude of the discrete Fourier transform. Formants are derived from the smoothed spectral envelope by locating all of the peaks (maxima) and identifying the location and amplitude level of each peak. This collection of peak locations and peak levels contains the spectral information necessary for a satisfactory estimation of formant values. The frequency region expected to contain the first three formants of a speech signal is then segmented into three regions. The lowest formant is searched for first, looking primarily in the lowest region; then the second formant is sought, primarily in the next higher region; and finally the third formant is sought in the highest of the three regions. Based on the amplitudes and frequencies of the peaks and their locations in the various regions or in regions of overlap, logical operations are performed by which spurious candidates are eliminated and the selected highest peaks are ordered and identified as speech formants. If the speech is unvoiced, only a single variable resonance peak and a single variable antiresonance are used to characterize the sound. They, too, are extracted from a cepstrally smoothed spectrum. A voiced-unvoiced decision additionally is obtained based on the presence or absence of a strong peak in the cepstrum together with a measure of a zero crossing count.
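The peak-location step can be illustrated with a short pure-Python sketch. It assumes the smoothed envelope is available as a list of log-magnitude samples spaced `df` Hz apart; the function name and the strict local-maximum rule (plateaus are not reported) are assumptions, since the patent leaves the construction of the peak picker open.

```python
def pick_peaks(levels, df):
    """Return (frequency_hz, level) for every strict local maximum of a sampled envelope.

    levels: spectral-envelope samples (for example, log magnitude) spaced df Hz apart.
    A sample is a peak only if it is strictly greater than both neighbours,
    so flat-topped maxima are not reported in this sketch.
    """
    peaks = []
    for k in range(1, len(levels) - 1):
        if levels[k - 1] < levels[k] > levels[k + 1]:
            peaks.append((k * df, levels[k]))
    return peaks
```

The resulting list of (frequency, level) pairs is the form of data the formant logic then works on.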

In order to convert the control parameters of the analyzer to speech, a digital, serial, terminal analog speech synthesizer is employed. It models the transmission characteristic of the vocal tract from glottis to mouth. Synthesizers based on such models have been described previously in the art, for example, in Gerstman-Kelly, U.S. Pat. No. 3,158,685, issued Nov. 24, 1964, as well as elsewhere. The variable resonance circuits employed in the synthesis network and the manner of controlling them may be substantially identical to those described in the Gerstman-Kelly patent.

Certain other refinements to the generation of parameter signals are employed to improve the synthesis of speech, particularly in those cases in which formants in the applied speech are too close together in frequency to be resolved.

This invention will be more fully understood from the following detailed description taken together with the drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block schematic diagram of a speech analyzer-synthesizer which illustrates the principles of this invention;

FIG. 2 illustrates the structure of a spectral envelope estimator suitable for use in the system of FIG. 1;

FIG. 3 depicts a pitch detector which may be used in the practice of the invention;

FIG. 4 illustrates the functional operation of unvoiced spectrum coder 18 used in the apparatus of FIG. 1;

FIG. 5 illustrates the manner in which FIGS. 6 and 7 are interconnected;

FIGS. 6 and 7 illustrate, by way of a functional flow chart, the operation of voiced spectrum coder 19 used in the analyzer of FIG. 1;

FIG. 8 depicts typical regions in the spectrum of a speech signal likely to contain formants;

FIG. 9 illustrates the threshold level of signal F2 relative to signal F1, useful in explaining the operation of a voiced spectrum coder;

FIG. 10 illustrates a characteristic cepstrally smoothed log spectrum of a speech signal; and

FIG. 11 illustrates the manner in which formants in the log spectrum of the signal of FIG. 10 are emphasized by virtue of the operation of the apparatus of this invention.

DETAILED DESCRIPTION OF THE INVENTION

FIG. 1 illustrates a band compression system, including an analyzer at a transmitter station and a synthesizer at a receiver station, which illustrates the principles of the invention. At the analyzer, an incoming speech wave from source 10, which may be a conventional transducer for converting speech sounds into a corresponding electrical wave, is applied both by way of modulator 11 to cepstrum analyzer 12, and to zero crossing counter 13. The purpose of the analyzer station is to develop control signals representative of the pitch period and formant locations for voiced speech, the resonance and antiresonance locations for unvoiced speech, and an indication of the magnitude of the buzz or hiss components during voiced and unvoiced speech intervals, respectively. A cepstrum analysis is particularly suitable for this purpose since it permits all of these parameter signals to be developed with a minimum of equipment complexity. Thus, estimation of the pitch period and the spectral envelope of the applied signal is accomplished from the computation of the cepstrum of a segment of the speech waveform. As discussed by Noll, the cepstrum of a signal is the spectrum of the logarithm of the power spectrum of the signal, and it exhibits a number of distinct peaks at pitch period intervals. Previous investigations have shown that the logarithm of the Fourier transform of a segment of voiced speech consists of a slowly varying component attributable to the convolution of the glottal pulse with the vocal tract impulse response, plus a rapidly varying periodic component due to the repetitive nature of the acoustic waveform. These two additive components, available in the cepstrum signal, may be separated by linear filtering.

Preparatory to developing the cepstrum of the applied signal, a segment of input speech, x(σ+nT), is weighted, through the action of modulator 11, by a symmetric Hamming window function, w(nT), such that

$$x(\sigma+nT)\,w(nT) = \bigl[\,p(\sigma+nT) * h(nT)\,\bigr]\,w(nT) \qquad (1)$$

where * denotes a discrete convolution, σ is the starting sample of the segment of the speech waveform, and T is the sampling period in seconds. In Eq. (1), p(σ+nT) represents a quasi-periodic impulse train appropriate for the particular segment being analyzed, and h(nT) represents the triple convolution of the vocal tract impulse response with the glottal pulse and the radiation load impulse response. The window function w(nT) tapers nearly to zero at each end to minimize the effects of a nonintegral number of pitch periods within the window. Since the window function varies slowly with respect to variations in the pitch of the applied signal, it is convenient to develop it, in function generator 23, from the indication of pitch period developed by pitch detector 14. Thus, the purpose of modulating the applied speech wave from transducer 10 by the window function in modulator 11 is to improve the approximation that a segment of voiced speech can be represented as a convolution of a periodic impulse train with a time-invariant vocal tract impulse response sequence. Preferably, the window function is specified by the equation:

$$w(nT) = \begin{cases} 0.54 - 0.46\cos\left(\dfrac{2\pi nT}{3\tau}\right), & 0 \le nT \le 3\tau \\[4pt] 0, & \text{elsewhere} \end{cases} \qquad (2)$$

The duration, 3τ, of the window is three times the previous estimate of pitch period. It is made dependent on the pitch period estimate from detector 14 for two conflicting reasons. In order to obtain a strong peak in the cepstrum at the pitch period, it is necessary to have several periods of the waveform within the window. In contrast, in order to obtain strong peaks in the smoothed spectrum, only about two periods should be within the window, i.e., the formants should not have changed appreciably within the time interval spanned by the window. Thus, an adaptive-width window assures better estimates of pitch and formants, since it presents a wider window for finding a strong peak at the pitch period and a narrower window for finding strong, unambiguous indications of formants. The choice of a window duration of three times the previous pitch period represents a compromise which has proven to be satisfactory.
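Eq. (2) translates directly into code. This sketch assumes the previous pitch estimate τ is given in seconds and rounds the 3τ span to a whole number of samples (a choice the text does not specify); the function name is illustrative.

```python
import math


def adaptive_hamming(pitch_period_s, fs):
    """Samples of Eq. (2): w(nT) = 0.54 - 0.46*cos(2*pi*nT/(3*tau)) for 0 <= nT <= 3*tau."""
    # 3*tau expressed in samples; rounding to the nearest sample is an assumption.
    n_span = int(round(3.0 * pitch_period_s * fs))
    if n_span < 1:
        raise ValueError("pitch period too short for the sampling rate")
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / n_span)
            for n in range(n_span + 1)]
```

For τ = 10 msec. at a 10 kHz sampling rate, the window has 301 samples, rising from 0.08 at the ends to 1.0 at the center; a longer pitch period automatically widens it.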

As noted earlier, the cepstrum developed at the output of analyzer 12 consists of two components. The component due primarily to the glottal wave and the vocal tract is concentrated in the region |nT| < τ, while the component due to the pitch occurs in the region |nT| ≥ τ, where τ is the pitch period during the segment being analyzed. The component due to excitation consists mainly of sharp peaks at multiples of the pitch period. Thus, the pitch period can be determined by searching the cepstrum for a strong peak in the region nT ≥ τmin, where τmin is the minimum expected pitch period. Signals from analyzer 12 are accordingly supplied as one input to pitch detector 14. Zero crossing count information developed by counter 13 is supplied as the other; this information is employed to provide an indication of the voiced or unvoiced character of the applied speech signal. Detector 14 produces a signal P, which is either equal to τ for voiced signals, in which case τ denotes the pitch period of the input signal, or zero for unvoiced signals. Details of a suitable pitch detector are described hereinafter with reference to FIG. 3.
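A minimal pure-Python sketch of the cepstral pitch search follows: it scans the cepstrum from τmin upward and returns the location and height of the largest value. The upper search limit `tau_max_s` is an added assumption (the text specifies only the lower limit τmin), and the function name is hypothetical.

```python
import math


def cepstral_pitch(cepstrum, fs, tau_min_s, tau_max_s):
    """Return (period_s, peak_value) of the largest cepstral value with tau_min <= nT <= tau_max."""
    lo = int(math.ceil(tau_min_s * fs))                # first quefrency index searched
    hi = min(int(tau_max_s * fs), len(cepstrum) - 1)   # hypothetical upper limit
    if lo > hi:
        raise ValueError("empty search range")
    best = max(range(lo, hi + 1), key=lambda n: cepstrum[n])
    return best / fs, cepstrum[best]
```

Large low-quefrency values (the vocal tract component, |nT| < τ) are excluded by starting at τmin; the returned peak height is what a voicing threshold would then be applied to.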

Similarly, a suitable examination of the cepstrum from analyzer 12 is performed to develop an estimate of the spectral envelope of the applied signal. Although a variety of techniques for deriving such an envelope signal are known in the art, one suitable arrangement is described hereinafter in the discussion of the arrangement of FIG. 2.

Peaks in the spectral envelope are identified in peak picker network 16. Suitable peak picking networks have been described variously in the art. Peaks of the spectral envelope are delivered by way of gate 17 either to unvoiced spectrum coder 18 or to voiced spectrum coder 19, depending upon whether the input speech signal is voiced or unvoiced. Accordingly, gate 17 is actuated by the voiced-unvoiced character of the pitch period signal developed by detector 14. If the input signal is voiced, values of τ, which appear as a "1" signal at the input of gate 17, open the gate so that peaks of the spectrum envelope are supplied to coder 19. If the input signal is unvoiced, a "0" pitch signal (absence of τ) is applied to gate 17, and the peaks are supplied instead to coder 18. The coders develop control signals for use in synthesizing the applied wave. Two control signals, Fp and Fz, are developed by coder 18, indicating for unvoiced speech the location of a single resonance and antiresonance in the speech signal, and three control signals, F1, F2 and F3, are produced by coder 19, representative of the locations of the first three formants of the applied signal. Coder 19, in addition to operating on the peaks of the spectrum envelope, also is supplied with cepstrum signals from analyzer 12.

Control signals AV and AN, representative of the levels of the buzz and hiss signals to be used in synthesis, are developed in buzz/hiss level control network 20 from the first spectrum signal produced by cepstrum analyzer 12. Apparatus for developing such level control signals is well known in the vocoder art; any form of buzz-hiss level analyzer may be employed.

Signals P, F1, F2, F3, Fp, Fz, AV and AN constitute all of the controls necessary for characterizing applied speech, both when voiced and unvoiced. Together, these signals require considerably less transmission bandwidth than would analog transmission of the applied speech signal. Accordingly, they may be delivered to multiplex unit 21, of any desired construction, wherein the group of control signals is prepared for transmission to a receiver station. At the receiver station, distributor unit 22, again of any desired construction, recovers the transmitted signals and makes them available for synthesis.

Received parameter signals may be used to control the production of artificial speech, using any well-known synthesis apparatus. For example, a formant vocoder synthesizer of the form described in the above-mentioned Gerstman and Kelly U.S. Pat. No. 3,158,685 is satisfactory. Typically, a formant vocoder synthesizer includes two systems of resonant circuits, one energized by a noise signal to produce unvoiced sounds, and the other energized by a periodic pulse signal to develop voiced sounds. In the illustrated apparatus, unvoiced resonant circuits 24 receive noise signals from generator 25 by way of modulator 26. The modulator is controlled by the hiss level control signal AN and serves to control the amplitude of noise signals supplied to the input of the resonant circuits. Spectrum signals Fp and Fz tune the resonant circuits 24 to shape the noise signals.

Voiced resonant circuits 27 are supplied, by way of modulator 28, with signals from pulse generator 29. Pulse generator 29, responsive to control signal P, develops a train of unit samples with the spacing between samples equal to τ, where τ is the value of P during voiced intervals. Such pulses are similar to the pulses of air passing through the vocal cords at their fundamental frequency of vibration, 1/τ. The amplitude of the resulting pulse train is controlled in modulator 28 by buzz level control signal AV, which represents the intensity of voicing. Resonant circuits 27, thus energized, are controlled by formant control signals F1, F2 and F3 to shape the train of pulse signals in a fashion not unlike the shaping of voiced excitation that takes place in the human vocal tract, and to produce voiced signals which correspond to those contained in the input signal. In the conventional manner, resonant system 27 includes additional fixed resonant circuits to provide high frequency shaping of the spectrum.
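To make the voiced branch concrete, the sketch below drives a cascade of second-order digital resonators with a unit-sample train whose spacing is the pitch period, scaled by a buzz level. The resonator form (poles at radius e^(−πBT), unity gain at 0 Hz) is a common digital formant resonator assumed here for illustration; the patent defers to the Gerstman-Kelly circuits, the bandwidths are hypothetical caller-supplied inputs, and the fixed higher-frequency resonators of system 27 are omitted.

```python
import math


def resonator(x, f_hz, bw_hz, fs):
    """Two-pole digital resonator: centre f_hz, bandwidth bw_hz, unity gain at 0 Hz."""
    T = 1.0 / fs
    r = math.exp(-math.pi * bw_hz * T)                  # pole radius set by the bandwidth
    a1 = 2.0 * r * math.cos(2.0 * math.pi * f_hz * T)   # pole angle set by the frequency
    a2 = -r * r
    g = 1.0 - a1 - a2                                   # normalizes the 0 Hz gain to 1
    y1 = y2 = 0.0
    out = []
    for sample in x:
        y = g * sample + a1 * y1 + a2 * y2
        out.append(y)
        y1, y2 = y, y1
    return out


def voiced_synth(period_samples, amplitude, formants, bandwidths, fs, n_samples):
    """Unit-sample train (spacing = pitch period) scaled by the buzz level, through cascaded resonators."""
    excitation = [amplitude if n % period_samples == 0 else 0.0 for n in range(n_samples)]
    signal = excitation
    for f, bw in zip(formants, bandwidths):             # one resonator per formant F1, F2, F3
        signal = resonator(signal, f, bw, fs)
    return signal
```

A serial (cascade) connection is used here because the text describes a serial terminal analog synthesizer; a parallel bank would need separate amplitude controls per formant.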

Voiced and unvoiced replica signals from circuits 24 and 27 are combined in adder 30 and delivered for use, for example, to energize loudspeaker 31. Additional spectral balance for the synthetic speech signals preferably is obtained by passing the signals from adder 30 through fixed spectral shaping network 32 before delivering them for use. This refinement aids in restoring realism to the reconstructed speech.

A form of spectral envelope estimator 15, suitable for use in the practice of the invention, is shown in FIG. 2. Low pass filtering of the cepstrum signal c(nT) is accomplished by first multiplying the supplied cepstrum by a cepstral window function l(nT), characterized by parameters τ1 and Δτ, where τ1 + Δτ is less than the minimum pitch period that will be encountered. The sequence e(nT) is next added to the sequence c(nT)l(nT). The purpose of adding this component to the cepstrum is to equalize formant amplitudes. The sequence e(nT) consists of four nonzero values.

Functions l(nT) and e(nT) may be produced, respectively, by function generators 51 and 53, constructed to evaluate the corresponding equations. Function generators suitable for making such evaluations are well known in the art. The signal from function generator 51 is applied to modulator 50, and the signal from function generator 53 is added to the resultant signal in adder 52. The sequence c(nT)l(nT) + e(nT) is then transformed, in discrete Fourier transformer 54, of any well-known construction, to produce an equalized spectral envelope.

Since the component of the cepstrum due to voiced excitation consists mainly of sharp peaks at multiples of the pitch period, the pitch period of the applied speech wave can be determined by searching the cepstrum for strong peaks in the region of the minimum expected pitch period. A suitable manner of doing this is shown in the detailed illustration of pitch detector 14 in FIG. 3. A zero crossing count from counter 13 (FIG. 1) is supplied to compare network 34, where the total count is compared with a threshold, typically with a value of 1,500 crossings per second. If the count is above the threshold, a signal Y=0 is delivered to logic OR gate 36; if the count is below the threshold, a signal Y=1 is delivered to gate 36. Cepstrum signals from analyzer 12 are delivered to peak picker network 37, which may be of the type described by Noll in U.S. Pat. No. 3,420,955, issued Jan. 7, 1969, or of any other desired form of construction. Cepstrum peaks are then compared in network 38 against a threshold established symbolically by potentiometer 39. If the amplitude of the detected peak is greater than the threshold, the comparator issues a signal X=1 to indicate that a voiced signal is present (because of the presence of a pitch period signal); if the peak amplitude is below threshold, a signal X=0 is delivered to logic OR gate 36. Peak signals from peak picker 37 are also delivered to gate 40. Gate 40 is controlled by the output of OR gate 36 such that a cepstrum peak signal above threshold, or a zero crossing count signal below threshold, indicates a voiced signal. Gate 40 thereupon permits the peak location signal from picker 37 to be delivered as an output signal, designated P=τ. If neither of the threshold criteria is met, logic OR gate 36 issues a zero, gate 40 is not actuated, and no signal appears at the output of the gate. This constitutes the signal P=0 and indicates that the applied signal is unvoiced.
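The decision logic of FIG. 3 reduces to a few lines. In this sketch the 1,500 crossings-per-second threshold is taken from the text, while the cepstral-peak threshold set by potentiometer 39 is a required, hypothetical argument because no value is given; the function and argument names are illustrative.

```python
def pitch_detector_output(zero_crossings_per_s, peak_value, peak_period_s,
                          peak_threshold, zc_threshold=1500.0):
    """Return P: the cepstral peak location (pitch period, s) if voiced, else 0.0 (unvoiced)."""
    x = 1 if peak_value > peak_threshold else 0            # comparator 38
    y = 1 if zero_crossings_per_s < zc_threshold else 0    # compare network 34
    voiced = x | y                                         # logic OR gate 36
    return peak_period_s if voiced else 0.0                # gate 40: P = tau, or P = 0
```

Either a strong cepstral peak or a low zero-crossing rate is sufficient to declare the segment voiced, mirroring the OR gate.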

From the derived peaks in the spectral envelope of the applied signal, it is in accordance with the invention to develop both signals for control of unvoiced resonant circuits at a synthesizer and signals representative of the formant frequencies and locations for use in the control of voiced resonant circuits at the synthesizer. If the speech is unvoiced, as indicated by the P=0 signal from pitch detector 14 applied to gate 17, then only a single variable resonance peak is used to characterize the sound. It has not been found necessary to estimate a second unvoiced resonance in order to synthesize unvoiced sounds. The resonance peak for unvoiced sounds is extracted from peaks in the spectral envelope in coder 18. Since there is no pitch period for these sounds, a fixed number of data points is analyzed. The resonance peak used is the strongest spectral peak above 1,000 Hz. Although coder 18 may be implemented in any desired fashion to select and process the desired spectral peak, it has been found convenient to employ a special purpose computer programmed, for example, in accordance with the flow chart of steps shown in FIG. 4.

As indicated in FIG. 4, peaks of the spectral envelope signal delivered to coder 18 are processed by defining the frequency of the highest peak above 1,000 Hz as Fp. The level Ap of this peak relative to the level of the spectrum at zero frequency is then computed, in z-transform notation (discussed hereinafter), as

$$A_p = 20\log_{10}\left|H\!\left(e^{j2\pi F_p T}\right)\right| - 20\log_{10}\left|H\!\left(e^{j0}\right)\right| \qquad (5)$$

If Ap is found to be greater than 13 db., Fz is assumed to be 500 Hz. If Ap is not greater than 13 db. above the reference, but is less than … db. below the reference, Fz is assumed to be equal to Fp. If Ap meets neither criterion, Fz is set equal to

$$F_z = \left(0.0065\,F_p + 4.5 - A_p\right)\left(0.014\,F_p + 28\right) \qquad (6)$$

The resulting frequencies Fp and Fz of the pole and zero in the unvoiced spectrum are available for use at the synthesizer in adjusting unvoiced resonant circuits 24. A suitable program listing for carrying out these operations on a computer is set forth in Appendix I, attached to this specification.
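The FIG. 4 decisions can be sketched as follows. The sketch takes Ap (in db.) as an input rather than computing it, treats the missing lower threshold as a required hypothetical argument `low_db` (the number of db. below the reference), and uses Eq. (6) with a minus sign before Ap, which is an assumption where the printed operator is illegible. The function name is illustrative.

```python
def unvoiced_pole_zero(peaks, a_p_db, low_db):
    """Return (Fp, Fz) in Hz for an unvoiced frame, or None if no peak lies above 1 kHz.

    peaks:  list of (frequency_hz, level) from the spectral-envelope peak picker.
    a_p_db: level of the Fp peak relative to the reference level, in db.
    low_db: hypothetical lower threshold, in db. below the reference.
    """
    above_1k = [p for p in peaks if p[0] > 1000.0]
    if not above_1k:
        return None
    f_p = max(above_1k, key=lambda p: p[1])[0]     # strongest peak above 1,000 Hz
    if a_p_db > 13.0:
        f_z = 500.0                                 # strong peak: zero fixed at 500 Hz
    elif a_p_db < -low_db:
        f_z = f_p                                   # weak peak: zero placed at the pole
    else:
        # Eq. (6); the minus sign before a_p_db is an assumption
        f_z = (0.0065 * f_p + 4.5 - a_p_db) * (0.014 * f_p + 28.0)
    return f_p, f_z
```

For example, with Fp = 3,000 Hz and Ap = 5 db., Eq. (6) as written here gives Fz = 19 × 70 = 1,330 Hz.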

Before proceeding to the details of the process for estimating the formant frequencies from peaks in the spectral envelope in coder 19, it is believed helpful to present data relating to the properties of the speech spectrum. FIG. 8 shows the frequency ranges of the first three formants as determined from experimental data. Individual speakers may have formant ranges somewhat different from those shown in the figure and, if known, these ranges may be used for that speaker. It is apparent that there is a high degree of overlap between the ranges in which formants may be located. The first formant range is from 200 to 900 Hz. However, for approximately one-half of this range (500-900 Hz) the second formant can overlap the first. Similarly, the second and third formant regions overlap from 1,100 to 2,700 Hz. Thus, the estimation of the formants is not simply a matter of locating peaks of the spectrum in non-overlapping frequency bands. Another property of speech pertinent to formant estimation is the relationship between formant frequencies and the relative amplitudes of formant peaks in the smoothed spectrum. Considerable importance, therefore, is placed on a measurement of the level of the second formant peak (F2) relative to the level of the first formant peak (F1). The level measurement A is defined, again in z-transform notation, as

$$A = \log\left|H\!\left(e^{j2\pi F_1 T}\right)\right| - \log\left|H\!\left(e^{j2\pi F_2 T}\right)\right| \qquad (7)$$

where F1 and F2 are the frequencies of the first and second formants, and |H(e^(j2πFT))| is the magnitude of the smoothed spectrum at F Hz. A careful analysis shows that A depends primarily upon F1 and F2 and is fairly insensitive to the bandwidths of all the formants and to the higher formant frequencies. FIG. 9 shows a curve of the minimum difference in formant level (in db.) between F1 and F2 as a function of the frequency F2. This curve takes into account equalization of the spectrum and serves as a threshold against which the difference between the level of a possible F2 peak and the level of an F1 peak is compared.
The dependence of A on F1 is eliminated by assuming that F1 is fixed at its lower limit, F1MN. If the F1 dependence were to be accounted for, a family of curves, similar in shape but displaced vertically from the one shown in FIG. 9, would be required. For a value of F1 greater than F1MN, the corresponding curve is above the curve shown in FIG. 9. In FIG. 9, the curve is flat until 500 Hz because F2 is assumed to be above this minimum value. The curve then decreases until about 1,500 Hz, reflecting the drop in F2 level as it gets further away from F1. Above 1,500 Hz, however, the curve rises again due to the increasing proximity of F2 and F3. The curve continues to rise until F2 reaches its maximum value, F2MX = 2,700 Hz, at which point F2 and F3 are maximally close (according to the simple model of fixed F3). In order to estimate formants from the spectral envelope, all peaks are located, and the frequency and amplitude of each peak are recorded. The frequency range of the applied signal is segmented into three regions not unlike those depicted in FIG. 8. The lowest formant, F1, is searched for first, then F2, and finally F3. Based on the amplitudes and frequencies of the peaks, spurious candidates are eliminated, and ambiguities resulting, for example, from closely spaced formants are resolved by a logical examination of the detected peaks.
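Eq. (7) needs the smoothed log magnitude at arbitrary frequencies F1 and F2, not only at DFT bins. One way to obtain it, sketched here under the assumption that a liftered, real, even cepstrum c(0), c(T), c(2T), ... is available (low-quefrency half only), is to sum its cosine series directly. Natural logarithms are used, so a difference of 1.0 corresponds to about 8.7 db., the scale the text uses for thresholds. How the result is compared with the FIG. 9 curve is left to the caller, and the function names are hypothetical.

```python
import math


def smoothed_log_level(cepstrum, f_hz, fs):
    """ln|H(e^(j*2*pi*f*T))| from the low-quefrency half of a real, even cepstrum."""
    w = 2.0 * math.pi * f_hz / fs
    # For an even cepstrum, log|H| = c(0) + 2 * sum_{n>=1} c(n) cos(w n).
    return cepstrum[0] + 2.0 * sum(cepstrum[n] * math.cos(w * n)
                                   for n in range(1, len(cepstrum)))


def formant_level_difference(cepstrum, f1_hz, f2_hz, fs):
    """Eq. (7): A = ln|H(F1)| - ln|H(F2)|, in natural-log units."""
    return (smoothed_log_level(cepstrum, f1_hz, fs)
            - smoothed_log_level(cepstrum, f2_hz, fs))
```

As a check, the cepstrum of 1 + 0.5z^(−1) gives ln 1.5 at 0 Hz and ln 0.5 at half the sampling rate, so A between those two frequencies is ln 3.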

In cases where F1, F2 and F3 are separated by more than about 300 Hz, there is no difficulty in resolving the corresponding peaks in the smoothed spectrum. However, when F1 and F2, or F2 and F3, get closer than about 300 Hz, the cepstral smoothing results in the peaks not being resolved. In these cases, a spectral analysis algorithm called the chirp z-transform (CZT) can be used to advantage. The CZT permits the computation of samples of the z-transform at equally spaced intervals along a circular or spiral contour in the z-plane. In particular, if F1 and F2 are close together, it is possible to compute the z-transform on a contour which passes closer to the pole locations than the unit circle contour, thereby enhancing the peaks in the spectrum and improving the resolution. For example, FIG. 10 shows a smoothed spectral envelope in which F1 and F2 are unresolved. In this case the parameters of the cepstral window function l(nT) were τ1 = 2 msec. and Δτ = 2 msec. FIG. 11 shows the results of a CZT analysis along a circular contour of radius e^… (inside the unit circle) over the frequency range 0 to 900 Hz, with a resolution of about 10 Hz. The effect of using the contour which passes closer to the poles is evident in contrast to FIG. 10. A discussion of the CZT algorithm is given in "The Chirp z-Transform Algorithm and Its Application," by Rabiner, Schafer and Rader, Bell System Technical Journal, May-June 1969, at p. 1249.
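The samples that the chirp z-transform produces can be illustrated by evaluating X(z) directly on a circle of radius r. The sketch below, assuming NumPy, does this by brute force (O(NK) operations); the CZT algorithm of Rabiner, Schafer and Rader obtains the same samples far more efficiently by means of FFTs. Which sequence is transformed and the contour radius are left as inputs, and the function name is hypothetical.

```python
import numpy as np


def czt_samples(x, fs, f_start, f_stop, df, radius):
    """Return (freqs, X) with X[k] = sum_n x[n] * z_k**(-n), z_k = radius * exp(j*2*pi*f_k/fs)."""
    x = np.asarray(x, dtype=float)
    freqs = np.arange(f_start, f_stop + df / 2.0, df)    # include f_stop
    z = radius * np.exp(2j * np.pi * freqs / fs)          # points on a circle of the given radius
    n = np.arange(len(x))
    # (K x N) matrix of z_k^(-n); a radius below 1 moves the contour toward poles inside the unit circle.
    kernel = z[:, None] ** (-n[None, :])
    return freqs, kernel @ x
```

Calling `czt_samples(seq, fs, 0.0, 900.0, 10.0, r)` with r < 1 corresponds to the 0 to 900 Hz, 10 Hz-spaced analysis described for FIG. 11; with r = 1 and DFT-bin frequencies it reduces to an ordinary DFT.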

Voiced spectrum coder 19, supplied with peaks of the spectral envelope during voiced speech intervals from gate 17 and with cepstrum signals C(nT) from analyzer 12, is accordingly programmed to take these characteristics of voiced speech into account. It serves to derive control signals F1, F2, F3, which specify formant frequencies and which are sufficient for controlling voiced resonant circuits 27 at a synthesizer. Again, the logical operations performed on the cepstrum and peak signals may be carried out using any desired form of apparatus. In practice, however, it has been found most convenient to employ a computer programmed in accordance with the steps set forth in the flow chart of FIGS. 6 and 7. Program listings for the steps of the flow chart appear in Appendix II of this specification.

Referring to FIGS. 6 and 7, the formants are picked in sequence beginning with F1. To start the process, the highest level peak of the spectrum from peak picker 16 in the frequency range 0 to F1MX is recorded as F0AMP. F1MX is the upper limit of the F1 region. Generally the value F0AMP will occur at a peak in the F1 region which will ultimately be chosen as the F1 peak. However, sometimes there is an especially strong peak below F1MN, the lower limit of the F1 region, which is due to the spectrum of the glottal source waveform. In such cases there may or may not be a clearly resolved F1 peak above F1MN. In order to avoid choosing a low level spurious peak, or possibly the F2 peak, as the F1 peak when in fact the F1 peak and the peak due to the source are not resolved, a peak in the F1 region is required to be less than 8.7 db. (1.0 on a natural log scale) below F0AMP to be considered as a possible F1 peak. The frequency of the highest level peak in the F1 region which exceeds this threshold is selected as the first formant, F1. The level of this peak is recorded as F1AMP. If no F1 can be selected this way, the spectral envelope in the region 0 to 900 Hz. is reevaluated. The spectral peaks are sharpened by weighting the cepstrum c(nT), supplied to coder 19 directly from analyzer 12, with a window w1(nT), where

$$w_1(nT) = e^{100\pi nT} \qquad (8)$$

and performing a spectral analysis on the resultant. This has the effect of evaluating the spectrum on a contour which passes closer to the poles. As previously discussed, the CZT algorithm is an efficient way of performing this evaluation. The enhanced section of the spectrum is then searched for the highest level peak in the F1 region. The location of this peak is accepted as F1. If the enhancement has failed to resolve the source peak and the F1 peak, F1 is arbitrarily set equal to F1MN, the lower limit of the F1 region.
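The F1 selection rule can be sketched as follows, assuming peaks are (frequency, level) pairs with levels on the natural-log scale used in the text, so that 8.7 db corresponds to a drop of 1.0. `pick_f1` is a hypothetical name, and the region limits used in the example are illustrative only; the actual F1MN and F1MX values are not stated in this section.

```python
def pick_f1(peaks, f1mn, f1mx, drop=1.0):
    """First-formant rule sketched from the text.

    peaks -- list of (frequency_hz, level), level in natural-log units
    drop  -- how far below F0AMP a candidate may lie (1.0, about 8.7 dB)
    Returns (F1, F1AMP, F0AMP). F1 and F1AMP are None when no peak in the
    F1 region qualifies; the caller then re-evaluates 0-900 Hz on an
    enhanced spectrum.
    """
    # F0AMP: level of the highest peak anywhere from 0 to F1MX
    low_levels = [lvl for f, lvl in peaks if f <= f1mx]
    if not low_levels:
        return None, None, None
    f0amp = max(low_levels)
    # F1 candidates: inside [F1MN, F1MX] and less than `drop` below F0AMP
    cands = [(lvl, f) for f, lvl in peaks
             if f1mn <= f <= f1mx and lvl > f0amp - drop]
    if not cands:
        return None, None, f0amp
    f1amp, f1 = max(cands)                   # highest qualifying peak
    return f1, f1amp, f0amp
```

With F1MN = 200 Hz and F1MX = 900 Hz (chosen only for illustration), a strong 150 Hz source peak at level 3.0 sets F0AMP, so a 350 Hz peak at level 1.5 is rejected while a 700 Hz peak at level 2.5 is accepted: `pick_f1([(150, 3.0), (350, 1.5), (700, 2.5), (1200, 2.0)], 200, 900)` returns `(700, 2.5, 3.0)`.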

The quantity F1AMP is used in the estimation of F2. If the F1 peak is very low in frequency and is not clearly resolved from the lower frequency peak due to the glottal waveform, F1AMP is set equal to (F0AMP - 8.7 db.). This is done to lower effectively (because F1 is very low) the threshold which is used in searching for F2. The first step in estimating F2 is to fix the frequency range to be searched. If F1 has been estimated to be less than F2MN, the lower limit of the F2 region, then only the region from F2MN to F2MX is searched. However, if F1 has been estimated to be greater than F2MN, it is possible that the F2 peak has in fact been chosen as the F1 peak. Therefore the combined F1-F2 region from F1MN to F2MX is searched to ensure that, if this is the case, the true F1 peak will be found as the F2 peak. After F2 has been estimated, F1 and F2 are compared, and their values are interchanged if F2 is less than F1.
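The bookkeeping around the F2 search can be sketched as three small helpers. The names are hypothetical, levels are in natural-log units (so the 8.7 db reduction is 1.0), and apart from F2MX = 2,700 Hz, which the text gives, the region limits in the example calls are illustrative.

```python
def f1amp_for_f2_search(f1amp, f0amp, f1_low_and_unresolved, drop=1.0):
    """Use F0AMP - 1.0 (about 8.7 dB) as F1AMP when F1 is very low and not
    resolved from the glottal-source peak; this lowers the F2 threshold."""
    return f0amp - drop if f1_low_and_unresolved else f1amp

def f2_search_range(f1, f1mn, f2mn, f2mx):
    """Range searched for F2: the F2 region alone if F1 < F2MN, otherwise the
    combined F1-F2 region, since the peak taken as F1 may really be F2."""
    return (f2mn, f2mx) if f1 < f2mn else (f1mn, f2mx)

def order_f1_f2(f1, f2):
    """After F2 is estimated, interchange F1 and F2 if F2 < F1."""
    return (f2, f1) if f2 < f1 else (f1, f2)
```

For instance, with F1MN = 200 Hz and F2MN = 550 Hz, an F1 estimate of 400 Hz gives the search range (550, 2700), whereas an F1 estimate of 800 Hz widens it to (200, 2700); if that wider search then returns 700 Hz as "F2", `order_f1_f2(800, 700)` restores the order (700, 800).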

In deciding whether a particular spectral peak under investigation is a possible candidate for an F2 peak, the threshold curve of FIG. 9 is used. The spectral peak is first checked to see if it is located in the proper frequency range. If so, the difference between the level of the peak under consideration and F1AMP is computed. If this difference exceeds the threshold of FIG. 9, that peak is a possible F2 peak; if not, that peak is not considered as a possible F2 peak. The value of F2 is chosen to be the frequency of the highest level peak to exceed the threshold. The level of this peak is recorded as F2AMP.
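The F2 candidate test can be sketched as below. Here `threshold` is a caller-supplied function of frequency standing in for the FIG. 9 curve, whose values are not reproduced in this text (digitized points of the figure could, for example, be interpolated with `numpy.interp`). We also assume that the peak already chosen as F1 is excluded when the combined F1-F2 region is searched; the text implies this but does not state it. `pick_f2` is a hypothetical name.

```python
def pick_f2(peaks, f1, f1amp, lo, hi, threshold):
    """F2 rule: among peaks in [lo, hi], other than the one already taken as
    F1, whose level minus F1AMP exceeds threshold(frequency), return
    (F2, F2AMP) for the highest one. (None, None) signals that F1 and F2
    are probably too close to resolve and an enhanced spectrum is needed."""
    cands = [(lvl, f) for f, lvl in peaks
             if lo <= f <= hi and f != f1 and (lvl - f1amp) > threshold(f)]
    if not cands:
        return None, None
    f2amp, f2 = max(cands)                   # highest qualifying peak
    return f2, f2amp
```

With a constant placeholder threshold of -2.0 (natural-log units), the peaks at 1100 Hz (level 1.5) and 2400 Hz (level 1.2) both qualify against F1AMP = 3.0, and the higher one, 1100 Hz, becomes F2.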

If no peaks are found which exceed the threshold, further analysis is called for. The fact that no peaks are located has been found to be a reliable indication that F1 and F2 are close together. Therefore the cepstrum is multiplied by the weighting function w1(nT), and a high resolution, narrow band spectrum is computed over the frequency range (F1 - 450) Hz. to (F1 + 450) Hz. (If F1 < 450 Hz., the range is 0 to 900 Hz.) This spectrum is evaluated along a circular arc of radius $e^{-100\pi T}$ in the z-plane. This analysis generally produces a spectrum such as that shown in FIG. 11, in which the two formants F1 and F2 are readily apparent.
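The enhanced narrow-band analysis can be sketched as follows, under stated assumptions: equation (8) is read as w1(nT) = e^{100πnT}, the vocal-tract model is minimum-phase (all-pole) so that the one-sided cepstrum is c(0) at n = 0 and 2c(n) for n > 0, and the Fourier sum is evaluated directly rather than with the CZT, purely for brevity. `enhanced_band` and `enhanced_log_spectrum` are hypothetical names.

```python
import numpy as np

def enhanced_band(f1):
    """Band analysed when F1 and F2 are unresolved: F1 +/- 450 Hz,
    or 0 to 900 Hz when F1 < 450 Hz."""
    return (0.0, 900.0) if f1 < 450.0 else (f1 - 450.0, f1 + 450.0)

def enhanced_log_spectrum(c, fs, f_lo, f_hi, df=10.0, b_hz=100.0):
    """Log magnitude on the arc z = r*exp(j*2*pi*f/fs), r = exp(-pi*b_hz/fs),
    for f = f_lo, f_lo+df, ..., f_hi, from the real cepstrum c[0..N-1].

    Weighting the one-sided cepstrum by w1(n) = exp(pi*b_hz*n/fs), which is
    exp(100*pi*n*T) for b_hz = 100, and summing on the unit circle equals
    evaluating the log spectrum on the smaller circle of radius r.
    """
    c = np.asarray(c, dtype=float)
    n = np.arange(len(c))
    chat = np.where(n == 0, c, 2.0 * c)               # one-sided cepstrum
    weighted = chat * np.exp(np.pi * b_hz * n / fs)    # w1(nT) weighting
    freqs = np.arange(f_lo, f_hi + 0.5 * df, df)
    # Direct evaluation of the Fourier sum; a CZT would do the same faster
    kernel = np.exp(-2j * np.pi * np.outer(freqs, n) / fs)
    return freqs, np.real(kernel @ weighted)
```

For an all-pole model whose cepstrum is known in closed form (two 150 Hz-bandwidth resonances at 500 and 600 Hz, say), the values returned agree with log|H(z)| computed directly on the circle of radius $e^{-100\pi T}$, where the effective bandwidths are reduced by about 100 Hz and the two peaks sharpen accordingly.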

The value of F1 is reassigned as the frequency of the highest level peak in the F1 region, and F2 is the frequency of the next highest peak. If only one peak is found, F1 is arbitrarily set equal to the frequency of that peak and F2 = (F1 + 200) Hz.
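The reassignment from the enhanced spectrum can be sketched as one helper that also serves the later F2-F3 case. It is hypothetical in name and behaviour at the edges: the text does not say what happens when no peak lies in the lower formant's region (we fall back to the overall highest peak) or when no peak is found at all (we return None).

```python
def resolve_close_pair(peaks, region=None, offset_hz=200.0):
    """Assign two close formants from peaks of the enhanced spectrum.

    peaks  -- list of (frequency_hz, level) found in the enhanced band
    region -- (lo, hi) of the lower formant's region, e.g. the F1 region;
              None means simply take the highest peak
    Returns (lower, upper): lower is the highest peak in `region`, upper the
    next highest remaining peak; with a single peak, upper = lower + 200 Hz.
    """
    if not peaks:
        return None, None                    # case not covered by the text
    if len(peaks) == 1:
        f = peaks[0][0]
        return f, f + offset_hz
    ranked = sorted(peaks, key=lambda p: p[1], reverse=True)
    # Highest-level peak inside the region (assumed fallback: overall highest)
    idx = next((i for i, (f, _) in enumerate(ranked)
                if region is None or region[0] <= f <= region[1]), 0)
    lower = ranked[idx][0]
    upper = ranked[1 if idx == 0 else 0][0]  # next highest remaining peak
    return lower, upper
```

Given enhanced-spectrum peaks at 480 Hz (level 2.0), 620 Hz (1.6) and 300 Hz (0.4) and an illustrative F1 region of 200 to 550 Hz, the helper returns (480, 620); a single peak at 520 Hz yields (520, 720).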

In searching for F3, a threshold on the difference in level between a possible F3 peak and the F2 peak is employed. In this case a fixed, frequency-independent threshold has been found satisfactory. If F2 is located without weighting the cepstrum with the w1(nT) function (i.e., F2 is not extremely low), the threshold on the difference is set at 17.3 db. (2.0 on a natural log scale). Otherwise, the threshold is effectively removed by setting it at 1,000 db.
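The F3 threshold rule, together with the conversion between the natural-log scale and decibels, can be sketched as follows. One natural-log unit of magnitude is 20/ln 10 ≈ 8.686 dB, which is where the text's 8.7 db (1.0) and 17.3 db (2.0) come from. The helper names are hypothetical.

```python
import math

# One natural-log unit of magnitude expressed in decibels: 20/ln(10)
NEPER_DB = 20.0 / math.log(10.0)

def f3_threshold(f2_found_without_weighting):
    """Largest allowed drop of an F3 candidate below F2AMP, in natural-log units."""
    if f2_found_without_weighting:
        return 2.0                  # 2.0 * NEPER_DB = 17.37 dB
    return 1000.0 / NEPER_DB        # "1,000 db": in effect no constraint

def is_f3_candidate(level, f2amp, threshold):
    """True when the peak lies less than `threshold` below F2AMP."""
    return (f2amp - level) < threshold
```

A peak 1.5 units (about 13 dB) below F2AMP therefore qualifies under the normal 2.0 threshold, while one 2.5 units (about 21.7 dB) below does not; after a weighted F2 search, any peak qualifies.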

The estimation of F3 from the smoothed spectrum is then carried out. Because of equalization, there is a possibility that the F3 peak has been found as F2. Thus, F2 is checked to see if it is greater than F3MN, the lower limit of the F3 region. If so, the search for F3 is extended to cover the combined F2-F3 region from F2MN to F3MX. Otherwise the frequency region F3MN to F3MX is searched. As before, a spectral peak is first checked to see if it is in the correct frequency range. Then the difference between the level of the peak being considered for an F3 peak and F2AMP is computed. The highest level peak which exceeds the threshold is chosen as the F3 peak. If no peak is found for F3, further analysis is again called for. It has been found that this situation is generally due to F2 and F3 being very close together. As before, an enhanced spectrum is computed by multiplying the cepstrum by the window function w1(nT) and performing a spectrum analysis on the resultant, in this case over the frequency range (F2 - 450) Hz. to (F2 + 450) Hz. The result is normally a spectrum similar to that shown in FIG. 11, in which F2 and F3 are clearly resolved. F2 is chosen to be the frequency of the highest peak and F3 the frequency of the next highest peak. If only one peak is found, that peak is arbitrarily called the F2 peak and F3 is set to (F2 + 200) Hz. (This may sometimes result in estimates of both F2 and F3 which are slightly high.) The final step in the process is to compare F2 and F3 and interchange their values if F2 is greater than F3.

The arrangement for estimating the three lowest formant frequencies of voiced speech, i.e., F1, F2 and F3, has been found to perform well on vowels, glides, and semivowels.
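The F3 search can be sketched with the same conventions as the earlier helpers: levels in natural-log units, `threshold` the allowed drop below F2AMP, and the peak already taken as F2 assumed to be excluded from the combined F2-F3 search (implied but not stated by the text). The names and the region limits in the example calls are illustrative.

```python
def f3_search_range(f2, f2mn, f3mn, f3mx):
    """If F2 lies above F3MN it may really be F3, so search the combined
    F2-F3 region; otherwise search the F3 region alone."""
    return (f2mn, f3mx) if f2 > f3mn else (f3mn, f3mx)

def pick_f3(peaks, f2, f2amp, lo, hi, threshold):
    """Highest peak in [lo, hi], other than the one taken as F2, lying less
    than `threshold` below F2AMP. None tells the caller to fall back to the
    enhanced spectrum over F2 +/- 450 Hz."""
    cands = [(lvl, f) for f, lvl in peaks
             if lo <= f <= hi and f != f2 and (f2amp - lvl) < threshold]
    return max(cands)[1] if cands else None

def order_f2_f3(f2, f3):
    """Final step: interchange F2 and F3 if F2 is greater than F3."""
    return (f3, f2) if f2 > f3 else (f2, f3)
```

With F2 = 1100 Hz at level 1.5 and peaks at 2500 Hz (level 0.2) and 3000 Hz (level -0.8), only the 2500 Hz peak lies within 2.0 units of F2AMP, so it is returned as F3 whether the F3 region alone or the combined region is searched.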
Although no attempt is made to deal with voiced stop consonants or nasal consonants, experience has shown that extremely natural-sounding synthetic speech may nevertheless be produced with the limited class of control signals employed in this invention. Advantageously, the control signals may be stored or transmitted with greatly reduced channel capacity, thus achieving substantial economies.

Variations and modifications of the system described herein will occur to those skilled in the art.

3,649,765

FORTRAN SUBROUTINE FOR ENHANCING FORMANTS AND PICKING PEAKS

C     CZT IS A SUBROUTINE FOR SPECTRAL ANALYSIS WHICH IS BASED ON THE
C     PRINCIPLES SET FORTH IN RABINER, SCHAFER, AND RADER, BSTJ,
C     MAY-JUNE, 1969

FORTRAN SUBROUTINE FOR GROSS PEAK SEARCH

FORTRAN SUBROUTINE FOR FINDING THE BIGGEST PEAK BETWEEN N1 & N2

      SUBROUTINE FINEPK(N1,N2,PKLOC,PKAMP,TAB)
      DIMENSION TAB(1)
      PKAMP=-100000.
      PKLOC=N1
      DO 10 I=N1,N2
      TMP=TAB(I)
      IF(TMP.LE.PKAMP) GO TO 10
      PKAMP=TMP
      PKLOC=I
   10 CONTINUE
      RETURN
      END

      SUBROUTINE ZERO(TAB,NL,NU)
      DIMENSION TAB(1)
      DO 10 I=NL,NU
      TAB(I)=0.
   10 CONTINUE
      RETURN
      END

FORTRAN SUBROUTINE FOR COPYING TABLES

      SUBROUTINE COPY(N,TAB1,TAB2)
      DIMENSION TAB1(1),TAB2(1)
      DO 10 I=1,N
      TAB2(I)=TAB1(I)
   10 CONTINUE
      RETURN
      END

1. Speech analysis apparatus for locating formants of a voiced speech signal, which comprises:

means supplied with a speech signal for developing a signal representative of a smoothed spectral envelope thereof,

means supplied with said spectral envelope signal for developing signals representative of the location and amplitude of peaks within assigned frequency ranges in said speech signal, said ranges being selected to encompass a prescribed frequency range of said speech signal with predetermined segments of overlap, and

means responsive to said peak representative signals for selecting as formants of said speech signal the highest amplitude peaks according to location within said ranges.

2. Speech analysis apparatus for locating formants of voiced speech signals, which comprises:

means for developing a signal representative of the cepstrum of an applied speech signal,

means for developing from said cepstrum signal a signal representative of the spectral envelope of said speech signal,

means for evaluating said spectral envelope signal along a contour close to the pole locations in the complex frequency plane thereby to produce a signal in which spectrum peaks are sharpened,

means responsive both to said spectral envelope signal and selectively to said cepstrum signal for developing signals representative of the location and amplitude of all peaks in said spectral envelope signal,

means responsive to said peak location signal for selecting and ordering in frequency the highest of said amplitude peaks, and

means for identifying said selected and ordered peak location signals as formants of said applied signal.

3. Speech analysis apparatus for locating formants of a voiced speech signal, which comprises:

means for developing a signal representative of the smoothed spectral envelope of an applied speech signal,

means for locating all peaks in said spectral envelope signal,

means for developing signals representative of the location and amplitude of each of said located peaks within assigned frequency ranges, said ranges being selected to encompass a selected frequency range of said applied signal with prescribed segments of overlap,

means responsive to said peak location signals for selecting the highest amplitude peak in each of said ranges,

means for identifying as formants of said applied signal said selected peaks which occur in nonoverlapping segments of said ranges, and

means for identifying as formants of said applied signal the highest amplitude peaks according to their location, which occur in overlapping segments of said ranges.

4. Apparatus as defined in claim 3, in combination with

spectral analysis means for enhancing said peaks in said spectral envelope signal.

5. Speech analysis apparatus for locating formants of voiced speech signals, which comprises:

means for developing a signal representative of the pitch period of an applied speech signal,

means for selectively weighting said applied speech signals with a symmetric window function of said pitch period signal,

means supplied with said weighted speech signal for developing a signal representative of the smoothed spectral envelope of said applied speech signal,

means for locating all peaks in said spectral envelope signal,

means for developing signals representative of the location and amplitude of each of said located peaks within assigned frequency ranges,

means responsive to said peak location signals for selecting the highest amplitude peak in each of said ranges,

means for identifying as formants of said applied signal said selected peaks which occur in nonoverlapping segments of said ranges, and

means for identifying as formants of said applied signal the highest amplitude peaks according to their location, which occur in overlapping segments of said ranges.

6. Apparatus as defined in claim 5, wherein said applied speech signals are weighted with a window function with a duration of approximately three times the pitch period of said applied speech signals.

7. Apparatus for analyzing speech frequency signals, which comprises:

means for counting the zero axis crossings of an applied speech signal,

means for developing a signal representative of the cepstrum of said speech signal, and

means responsive to said zero crossing count and to said cepstrum signal for determining therefrom the voiced-unvoiced character of said speech signal and, if voiced, the pitch period of said signal.

8. A speech signal analyzer system for producing coded signals from applied speech signals, which comprises:

means for developing a signal representative of the smoothed spectrum of an applied speech signal,

means for locating all peaks in said spectrum,

means responsive to said located peaks and selectively to said spectrum for developing, during voiced intervals of said speech signal, control signals representative of the location of the highest of said spectrum peaks in a prescribed order as formants of said applied signal,

means responsive to said spectrum for developing control signals representative of the level of said applied signal during voiced and unvoiced intervals, respectively,

means for developing a signal representative of the cepstrum of said applied speech signal,

means responsive to a count of zero axis crossings in said applied signal and to said cepstrum for developing a signal representative of the voicing character of said applied signal and the pitch period of voiced intervals thereof,

means responsive to said peak signals for developing a signal representative of the pole and zero locations for unvoiced intervals of said applied signal, and

means for utilizing all of said developed signals as a coded representation of said applied speech signal.

9. A speech signal analyzer-synthesizer system with reduced channel bandwidth requirements, which comprises:

at an analyzer station,

means for developing a signal representative of the smoothed spectrum of an applied speech signal,

means for locating all peaks in said spectrum,

means responsive to an indication of said located peaks and selectively to said spectrum signal for developing, during voiced intervals of said speech signal, control signals representative of the location of the highest of said amplitude peaks in a prescribed order as formants of said applied signal,

means responsive to said spectrum signal for developing signals representative of the level of said applied signal during voiced and unvoiced intervals, respectively,

means for developing a signal representative of the cepstrum of said applied signal,

means responsive to a count of zero axis crossings in said applied signal and to said cepstrum signal for developing signals representative of the voicing character of said applied signal and the pitch period of voiced intervals thereof,

means responsive to said peak signals for developing signals representative of the pole and zero locations for unvoiced intervals of said applied signal, and

means responsive to all of said developed signals for delivering them to a synthesizer station, and

at said synthesizer station,

means responsive to received unvoiced level control signals for adjusting the level of a source of noise signals,

a system of unvoiced resonant circuits energized by said adjusted noise signals,

means for adjusting said resonant system with said pole and zero location signals to produce an unvoiced signal,

generator means responsive to said pitch period control signal for developing pulses at pitch frequency,

means for adjusting the amplitude of said pulses according to said level control signal during voiced signals of said applied signal,

a system of resonant circuits energized by said control pulse signals and by said formant signals to produce a voiced signal,

means for combining said voiced and unvoiced signals,

means for shaping the spectrum of said combined signal, and

means for utilizing said shaped spectrum signal as a replica of said applied speech signal.